Abstract Game-based environments frequently afford students the opportunity to exert 
agency over their learning paths by making various choices within the environment. The 
combination of log data from these systems and dynamic methodologies may serve as a 
stealth means to assess how students behave (i.e., deterministic or random) within these 
learning environments. The current work captures variations in students’ behavior patterns 
by employing two dynamic analyses to classify students’ sequences of choices within an 
adaptive learning environment. Random Walk analyses and Hurst exponents were used to 
classify students’ interaction patterns as random or deterministic. Forty high school 
students interacted with the game-based system, iSTART-ME, for 11 sessions (pretest, 8 
training sessions, posttest, and a delayed retention test). Analyses revealed that students 
who interacted in a more deterministic manner also generated higher quality self- 
explanations during training sessions. The results point toward the potential for dynamic 
analyses such as random walk analyses and Hurst exponents to provide stealth assess- 
ments of students’ learning behaviors while engaged within a game-based environment. 

Keywords Game-based intelligent tutoring systems • Dynamic analyses • Stealth 
Game-based Intelligent Tutoring Systems (ITSs) are computer-based learning 
environments that provide students with pedagogical instruction within the context of a game 
(Van Eck 2007). Game-based ITSs can be situated in a variety of domains such as 
science (Johnson-Glenberg et al. 2011; Sabourin et al. 2012), mathematics (Rai and 
Beck 2012), and technology education (van Eck 2006). One key feature of these 
environments is that they often afford students the opportunity to exert agency over 
their learning path by allowing for multiple methods and trajectories of play (King and 
Cazessus 2014; Sabourin et al. 2012; Schmierbach et al. 2012; Teng 2010). This 
inevitably leads to students interacting with and experiencing the game-based 
environment differently. Examining the ways in which students behave when they 
are afforded this agency can lead to a better understanding of optimal and non-optimal 
behaviors within a learning environment (Sabourin et al. 2012). The inclusion of 
agentic features (e.g., choose your own path or edit an avatar) has been associated 
with increased immersion, motivation, and positive learning gains (Cordova and 
Lepper 1996; Schmierbach et al. 2012; Teng 2010). 

Although variations in students’ behaviors may prove to be invaluable information 
for researchers, these behaviors are often difficult to measure and quantify. Tradition- 
ally, scientists have used self-report measures as proxies to gauge students’ actions and 
behaviors during learning tasks (Rosenbaum 1980; Zimmerman and Schunk 1989, 
2001; Zimmerman 1990). While informative, traditional self-report measures that 
assess students’ behaviors and intentions during learning tasks may not fully or 
adequately capture their target construct (Hadwin et al. 2007; Zhou 2013). Indeed, an 
overarching concern regarding self-reports is the frequent mismatch between students’ 
reports of what they do and observations of their actual performance (McNamara 
2011). The mismatch between self-reports and behavior may arise from a number of 
factors. First, self-report relies on the student’s memory for past events and behaviors, 
and these memories can be inconsistent and unreliable. Second, the student may lack a 
clear understanding of what constitutes good and poor performance, leading to over- or 
underestimation of various traits. Third, behaviors, cognitive states, and affect can be 
difficult to observe because they are often nonverbal; thus, the student may not be 
conscious of them, and they may not be evident to an observer. Finally, and perhaps 
foremost, learning strategies and behaviors are dynamic (Hadwin et al. 2007; Lord 
et al. 2010). Students often behave and learn differently depending on the 
domain, context, and task. Learning behaviors dynamically fluctuate between 
contexts and tasks and they also fluctuate within tasks as comprehension and 
learning develop and change over time. Hence, static measures may not ade- 
quately capture nuanced changes in how learners modulate and change their 
behaviors across varying goals and task demands. 


Online Measures 

Online measures offer an alternative means of capturing the dynamic nature of learning 
behaviors (Hadwin et al. 2007; Ventura and Shute 2013; Winne and Hadwin 2013; 
Zhou 2013). In contrast to offline self-reports and post-assessments, online measures 
capture behaviors from the learner in real-time. These measures capture nuanced 
patterns in students’ behaviors and thus may be more likely to capture how students 
exert agency while engaging in learning tasks. 
Within automated learning environments, online measures such as log data can act as 
a form of stealth assessment by unobtrusively capturing variations in students’ behaviors 
(Shute et al. 2009; Shute 2011). Log data, also referred to as keystroke, mouse click, 
click stream, or telemetry data (depending on the context), is essentially the recording of 
all of a user’s interactions or keystrokes while interacting with an automated system. 
Notably, the collection of log data is not built into all computerized systems, but rather 
must be intentionally programmed. When it is collected, log data can provide a wealth of 
information, particularly concerning students’ choices and agency while engaged with a 
system (Hadwin et al. 2007; Sabourin et al. 2012; Schulte-Mecklenbeck et al. 2011; Shih 
et al. 2010; Snow et al. 2014). 

For instance, Hadwin and colleagues (Hadwin et al. 2007) utilized users’ log data 
from the gStudy system to create profiles of students’ self-regulatory behaviors. The 
gStudy system is a web-based platform designed to investigate students’ annotation 
(e.g., highlight, label, or classify) of educational content. Hadwin and colleagues 
examined how students’ patterns of annotation and study habits informed profiles of 
self-regulated learning (SRL). They demonstrated that log data informed profiles of self-regulated behaviors by 
revealing fine-grained behavioral patterns that students exhibited while studying. 
Hadwin and colleagues argue that these nuanced patterns would have been missed 
by self-report measures alone. 

Similarly, Sabourin and colleagues (Sabourin et al. 2012) examined how log data 
from the narrative-centered environment, Crystal Island, was indicative of students’ 
strategy use (e.g., self-monitoring and goal setting). Sabourin et al. investigated how 
students’ behaviors during game-play (e.g., use of notes, books, or in-game tests) and 
pretest self-report measures of affect and prior knowledge combined to predict stu- 
dents’ level (i.e., low, medium, or high) of strategy use. They found that the inclusion of 
system log data significantly contributed to the classification of students’ use of 
metacognitive strategies. Such research demonstrates that log data extracted from 
adaptive environments yield unique and unobtrusive means to examine the ways in 
which individuals behave during learning tasks, and these behaviors are important 
indicators of individual differences that contribute to learning outcomes. 


Dynamic Systems Theory 

In conjunction with log data, dynamic systems theory and its associated analysis 
techniques offer researchers a unique means of characterizing patterns that emerge 
from students’ behaviors within an adaptive system. Such an approach treats time as a 
critical variable in addressing patterns of stability and change. Dynamic analyses focus 
on the complex and sometimes fluid interactions that occur within a given environment 
rather than treating behavior as static (i.e., unchanging), as is customary in many 
statistical approaches. 

Dynamic methodologies have been utilized in adaptive systems to investigate the 
complex patterns that emerge in students’ behaviors (Hadwin et al. 2007; Snow et al. 
2013; Soller and Lesgold 2003; Zhou 2013). For example, Snow et al. (2013) used 
random walk algorithms to visualize how individual differences influenced students’ 
trajectories within a game-based environment. Results from that study revealed that 
students’ trajectories within a game-based environment varied as a function of 
individual differences in students’ reading comprehension ability. Snow and colleagues 
argue that choice patterns that manifest within students’ log data are likely to be 
overlooked using more traditional (e.g., static) statistical analyses, and that dynamic 
analyses offer a readily available means to capture this crucial source of information. 

Such research affords scientists a dynamical perspective of students’ behaviors 
within adaptive environments; however, it reveals little information about how students 
regulate or control their choices. The current work utilizes two dynamic methodologies, 
random walks and Hurst exponents, to visualize and classify how patterns in students’ 
behaviors manifest over time and relate to learning gains. Random walks are mathe- 
matical tools that provide a graphical representation of a path or trajectory (Benhamou 
and Bovet 1989). Thus, random walks afford researchers the opportunity to visualize 
fine-grained patterns that form in categorical data across time. This technique has been 
used in a variety of domains, such as economics (Nelson and Plosser 1982), ecology 
(Benhamou and Bovet 1989), psychology (Collins and De Luca 1994), and genetics 
(Lobry 1996). For instance, geneticists have utilized these visualization tools to 
examine distinct patterns of disease and coupling in gene sequences (Arneodo et al. 1995; 
Lobry 1996). More recently, learning scientists have utilized this technique to visualize 
how interaction trajectories within adaptive systems vary as a function of individual 
differences (Snow et al. 2013). 

While random walk analyses generate visualizations of unique patterns across time, 
Hurst exponents (Hurst 1951) classify the tendency of those patterns. Hurst exponents 
characterize statistical changes in time series by revealing persistent, random, and 
antipersistent behavioral trends (Mandelbrot 1982). When fluctuations in patterns are 
positively correlated from one moment to the next, they are exhibiting a persistent (i.e., 
deterministic) quality. Time series fluctuations exhibiting deterministic tendencies are 
assumed to reflect self-organized and controlled processes (Van Orden et al. 2003). By 
contrast, when each moment in a time series is independent of every other moment, the 
fluctuations in the time series are exhibiting random characteristics. Time series that 
exhibit random processes reflect a breakdown in system functioning and control (e.g., 
Peng et al. 1995). Finally, when time series fluctuations are negatively correlated from 
one moment to the next, they are exhibiting antipersistent behavior (Collins and De 
Luca 1994). Time series fluctuations exhibiting antipersistent behaviors are assumed to 
be demonstrating corrective processes (Collins and De Luca 1994). 
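To make the three regimes concrete, the following is a minimal NumPy sketch, not the authors' implementation. It estimates a scaling exponent with first-order detrended fluctuation analysis (DFA; Peng et al. 1994), the estimator the Method uses for Hurst exponents, and labels the result using the conventional reading of DFA exponents: values near 0.5 indicate uncorrelated (random) fluctuations, values above 0.5 persistent ones, and values below 0.5 antipersistent ones. The function names `dfa_exponent` and `classify_exponent`, the log-spaced range of window sizes, and the ±0.1 tolerance band around 0.5 are illustrative assumptions.

```python
import numpy as np


def dfa_exponent(series, order=1, n_scales=12):
    """Estimate the DFA scaling exponent of a 1-D time series (illustrative sketch)."""
    x = np.asarray(series, dtype=float)
    n_obs = x.size
    if n_obs < 64:
        raise ValueError("need at least 64 observations to fit several scales")
    # Profile: cumulative sum of the mean-centred series.
    profile = np.cumsum(x - x.mean())
    # Window sizes n, log-spaced from 4 to N/4 (an assumed, common choice).
    scales = np.unique(
        np.round(np.logspace(np.log10(4), np.log10(n_obs // 4), n_scales)).astype(int)
    )
    fluctuations = []
    for n in scales:
        t = np.arange(n)
        n_bins = n_obs // n  # non-overlapping bins; any remainder is dropped
        mean_sq = []
        for b in range(n_bins):
            segment = profile[b * n:(b + 1) * n]
            # Remove a local least-squares trend (linear when order=1) in each bin.
            trend = np.polyval(np.polyfit(t, segment, order), t)
            mean_sq.append(np.mean((segment - trend) ** 2))
        # Root-mean-square residual across all bins at this scale: F(n).
        fluctuations.append(np.sqrt(np.mean(mean_sq)))
    # The exponent is the slope of log F(n) against log n.
    slope, _intercept = np.polyfit(np.log(scales), np.log(fluctuations), 1)
    return slope


def classify_exponent(alpha, tol=0.1):
    """Label an exponent relative to 0.5; the +/- tol band is an assumption."""
    if alpha > 0.5 + tol:
        return "persistent"      # positively correlated fluctuations
    if alpha < 0.5 - tol:
        return "antipersistent"  # negatively correlated fluctuations
    return "random"              # approximately uncorrelated


rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)
for label, series in [("white noise", noise),
                      ("cumulative sum", np.cumsum(noise)),
                      ("first difference", np.diff(noise))]:
    alpha = dfa_exponent(series)
    print(f"{label}: alpha = {alpha:.2f} -> {classify_exponent(alpha)}")
```

For these three inputs the exponents should come out near 0.5, 1.5, and 0, respectively. DFA exponents above 1 indicate nonstationary, Brownian-like signals; this simple labeler groups them under "persistent".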

The goal of the current study is to investigate how variations in students’ behaviors 
manifest across time, and ultimately impact daily learning outcomes within a game- 
based system. Random walks and Hurst exponents are used to capture the fine-grained 
behavior patterns that manifest within students’ log data collected across multiple 
sessions within a complex learning environment. Ultimately, the combination of log 
data and dynamic techniques may serve as novel forms of stealth assessment, exam- 
ining students’ propensity to act in deterministic (or random) manners across time, and 
without relying on obtrusive survey methodologies. 


iSTART-ME 

The context of the current study is the game-based learning environment, iSTART-ME 
(Interactive Strategy Training for Active Reading and Thinking-Motivationally-Enhanced; 
Jackson and McNamara 2013). This system provides students with instruction on the 
use of self-explanation and comprehension strategies (Jackson et al. 2012; 
Jackson and McNamara 2013). iSTART-ME is an ideal environment to examine how 
patterns manifest in students’ choices across time because it requires multiple sessions 
to complete, includes multiple modules, and lets students choose their individual paths 
within the environment. Hence, it affords an environment in which students have 
agency over their learning paths and objectives. 

iSTART-ME is based on a traditional intelligent tutoring system, iSTART 
(McNamara et al. 2004) but integrates games and game-based features to enhance 
students’ motivation, engagement, and persistence over time (Jackson et al. 2009; 
Jackson and McNamara 2013). The game-based features in iSTART-ME were incor- 
porated within iSTART following research emphasizing the importance of factors 
related to motivation such as students’ self-efficacy, engagement, self-regulation, and 
interest (Alexander et al. 1997; Bandura 2000; Pajares 1996; Pintrich 2000; 
Zimmerman and Schunk 2001). Previous work has revealed that when game-based 
features are embedded within iSTART-ME, students report an increase in engagement 
and motivation across multiple training sessions (Jackson and McNamara 2013). The 
current study takes this work a step further by examining how students interact with 
these game-based features incorporated within the system interface. 

Both iSTART and iSTART-ME introduce, demonstrate, and provide students with 
practice using self-explanation reading strategies for complex science texts. This is 
accomplished in three separate modules referred to as introduction, demonstration and 
practice (see Jackson et al. 2009). The game-based practice within iSTART-ME is 
referred to as extended practice. In this interface, students can choose to read and self- 
explain new texts, personalize characters, play mini-games, earn points, purchase 
rewards, and advance levels through the use of an embedded selection menu (see 
Fig. 1). Additionally, within this selection menu, students can view their current level 
and the number of skill points and trophies earned. 



Fig. 1 Screenshot of the iSTART-ME selection menu 
In the extended practice interface, students can choose to generate their own 
self-explanations within three different practice environments: Coached Practice, Map 
Conquest, and Showdown. These environments afford students the opportunity to 
engage in strategy practice and receive feedback on the quality of their self-explana- 
tions. Coached Practice is a non-game based method of practice adapted from the 
original iSTART system. In this environment, a pedagogical agent guides practice and 
provides students with formative feedback on their generated self-explanations. In 
contrast, Showdown and Map Conquest are both game-based practice environments. 
In Showdown, students compete against a computer player by generating self- 
explanations in an attempt to win points (see Fig. 2). In Map Conquest, students 
generate self-explanations to earn dice, which are used to conquer squares on a map 
(see Fig. 3). As students engage with texts in these practice environments, they can earn 
points that allow them to progress through a series of levels ranging from 0 to 25. Each 
level requires more points to proceed than the previous level; thus, students must exert 
more effort as they advance to higher levels in the system. 

Students’ points also serve as a form of currency (iBucks) that can be used to unlock 
game-based features within the system. There are two primary uses for iBucks: 
interacting with personalizable features and playing identification mini-games. 
Personalizable features were implemented into the system as a means to enhance 
students’ personal investment and sense of control over their learning environ- 
ment. Within iSTART-ME, students have three personalizable feature options: 
changing the background theme, customizing an avatar, and editing a pedagogical 
buddy. Students can also use their iBucks to interact with identification mini-games. 
These mini-games were added to iSTART-ME to provide students with opportunities to 
practice identifying the various self-explanation strategies. For instance, in the 
mini-game Balloon Bust, students are shown a target sentence and an example of a 
self-explanation. They must then decide which previously learned strategy was used to 
generate the example self-explanation and pop (by clicking with the computer mouse) 
the corresponding balloons on the screen (see Fig. 4). 


Fig. 2 Screenshot of Showdown 

Fig. 3 Screenshot of Map Conquest 

Fig. 4 Screenshot of Balloon Bust 


Springer 







Int J Artif Intell Educ 


Current Study 

In summary, game-based environments afford students multiple methods of interaction 
and play. Log data from these environments can capture variations in these behaviors to 
help scientists decipher various learning patterns. In particular, researchers can apply 
dynamic analyses that focus on the fluid changes in nuanced behavior patterns to gain a 
deeper understanding of how students control their behaviors over time and ultimately, 
the impact those behaviors have on learning outcomes. The current study uses two 
statistical techniques: random walks and Hurst exponents. The combination of these 
two techniques provides a means to visualize and categorize fine-grained patterns in 
students’ behaviors that emerge within system log data across time. The current study 
uses these methodologies to examine two research questions. 

1) Do students demonstrate controlled patterns of interaction (i.e., deterministic) 
within the game-based system iSTART-ME? 

2) How do variations in students’ interaction patterns impact in-system performance, 
posttest, and long-term retention learning outcomes (i.e., self-explanation quality)? 


Method 

Subjects 

The data that are analyzed in this paper were collected as part of a larger laboratory 
study that compared three conditions: iSTART-ME, iSTART-Regular, and a no-tutoring 
control (Jackson and McNamara 2013). Participants in the current study are the subset 
of students from the original study who were assigned to the iSTART-ME condition. 
These participants included 40 high-school students from a mid-western urban envi- 
ronment. The students were, on average, 15.5 years of age, with a mean reported grade 
level of 10. Of the 40 students, 50 % were female; 17 % were Caucasian, 73 % 
were African-American, and 10 % reported other ethnicities. All participants were 
monetarily compensated for their participation. 

Procedure 

The study comprised 11 sessions within a laboratory experiment that included a pretest, 
8 training sessions, a posttest, and a delayed retention test. During the first session, 
participants completed a pretest survey comprising a battery of measures, including an 
assessment of their prior self-explanation (SE) ability. During sessions 2 through 9, 
students engaged with the iSTART-ME system for approximately 1 h per session. 
Throughout these training sessions, students interacted with the full game-based menu, 
where they could choose to interact with generative practice games, identification mini- 
games, personalizable features, and achievement screens. Students completed a posttest 
during session 10 that included similar measures to the pretest. One week after 
completing the posttest, students returned to the lab for session 11, which consisted 
of a retention test that contained similar measures to the pretest and posttest (e.g., self- 
explanation ability). 
Measures 

Strategy Performance 

To assess self-explanation quality at pretest, posttest, and retention, students were asked 
to read through a text one sentence at a time and were then prompted to provide a self- 
explanation for approximately 8 to 12 target sentences within each text. Students also 
generated self-explanations during training while interacting with the practice games in 
iSTART-ME. The quality of students’ generated self-explanations was assessed through 
the use of a feedback algorithm that utilizes both latent semantic analysis (LSA; 
Landauer et al. 2007) and word-based measures (McNamara et al. 2007). This 
algorithm scores self-explanations on a scale ranging from 0 to 3. A score of “0” is assigned 
to poor self-explanations consisting principally of irrelevant information that is not 
contained in the text. A score of “1” is assigned to self-explanations that relate to the 
target sentence but lack elaborations that use information from the text or prior 
knowledge (e.g., paraphrases). A score of “2” is assigned when self-explanations 
incorporate information from the text beyond the target sentence (e.g., include 
text-based inferences). Finally, a score of “3” suggests that a student’s self-explanation 
incorporates information from both the text and prior knowledge. The assessment 
accuracy of this algorithm has been shown to be comparable to human ratings across 
a variety of texts (McNamara et al. 2007). Using this algorithm, students’ 
self-explanations were scored at pretest, training, posttest, and retention. Within the current 
study, students’ training self-explanation scores were averaged (across all 8 sessions) to 
create an aggregate score that represented their overall performance within the system. 
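The aggregation step can be illustrated with a small Python sketch. It assumes each student's training self-explanation scores (0 to 3, as assigned by the feedback algorithm) are available grouped by session; the function name `training_aggregate` and this input layout are hypothetical. The text does not say whether all scores were pooled or session means were averaged, so both options are shown; they differ whenever sessions contain unequal numbers of self-explanations.

```python
from statistics import mean


def training_aggregate(scores_by_session, pool=True):
    """Aggregate one student's training self-explanation scores (hypothetical layout).

    scores_by_session: list of lists, one inner list of 0-3 scores per session.
    pool=True averages every score; pool=False averages the per-session means.
    """
    for session in scores_by_session:
        for score in session:
            if score not in (0, 1, 2, 3):
                raise ValueError(f"score {score!r} outside the 0-3 scale")
    nonempty = [session for session in scores_by_session if session]
    if pool:
        return mean(score for session in nonempty for score in session)
    return mean(mean(session) for session in nonempty)


sessions = [[2, 3, 1], [2, 2], [3]]
print(training_aggregate(sessions))              # pooled: 13 / 6
print(training_aggregate(sessions, pool=False))  # mean of session means: 7 / 3
```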

System Interactions 

Students interacted freely within iSTART-ME for 8 training sessions. Every choice a 
student made was logged into the system database. We then categorized every interaction 
choice within those raw data files into one of four game-based categories (described 
below). It is important to note that only completed actions were retained for this analysis. 
Thus, if a student opened a game and then exited the game without finishing it (regardless 
of time spent in the game), that interaction would not be counted as a game played and 
therefore would not be included in the final analyses. The analysis included a total of 
11,120 game-based interactions, with an average of 278 (SD = 33) choices per student. 

As described earlier, students’ interactions with iSTART-ME involved one of four 
types of game-based features, each representing a different type of game-based func- 
tionality within the system: 

1. Generative practice games. iSTART-ME includes three practice environments 
(Coached Practice, Map Conquest, and Showdown) that prompt students to gen- 
erate their own self-explanations within the context of a game. Within the gener- 
ative practice environments students receive feedback concerning their self-expla- 
nations. Thus, generative practice games are designed to provide students with 
opportunities to apply comprehension strategies while reading challenging texts, 
and receive feedback on the quality of their self-explanations. On average, students 
interacted with generative practice games 19.03 times (SD = 7.07). 
2. Identification mini-games. There are six identification mini-games that reinforce 
the learning strategies and goals presented by asking the students to identify the 
type of strategies used within example self-explanations. These games do not 
prompt students to generate their own self-explanation, but instead provide stu- 
dents with strategy recognition practice. This involves students reading the text and 
an explanation of the text, and then choosing the principal strategy used to generate 
that explanation. On average, students interacted with the identification mini- 
games 24.9 times (SD = 29.17). 

3. Personalizable features. Students have the opportunity to personalize features within 
the iSTART-ME environment. These customizable options include: editing an avatar, 
customizing the background theme, or changing their pedagogical agent. The 
personalizable features potentially provide a means to enhance students’ engagement 
and afford a feeling of personal investment within the game interface (Jackson and 
McNamara 2013). However, they also potentially distract from the learning process 
because they are unrelated to learning how to better understand challenging text. On 
average, students interacted with personalizable features 6 times (SD = 8.04). 

4. Achievement screens. As students engage with the iSTART-ME system, they can 
earn points, win trophies, and advance to higher achievement levels. Within the 
main interface, students can view their progress in the system by scrolling over 
icons and opening achievement screens. When students choose to view any of 
these progress screens they are engaging with achievement screens. Achievement 
screens were embedded within the system to assess the relation between monitor- 
ing these sources of information about performance and the learning outcomes. For 
example, if students are able to track their progress within a system, they in turn 
may become more personally invested in their performance. Alternatively, tracking 
this information may distract students from the learning process. On average, 
students interacted with achievement screens 45.45 times (SD = 36.09). 

Tracking the use of these four distinct features of iSTART-ME using log data 
collected during the study affords the means to investigate patterns in students’ choices 
across and within each type of interaction. 
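The log-processing steps described above (keeping only completed actions, then coding each action into one of the four categories) might look like the following Python sketch. The record fields (`student_id`, `activity`, `completed`) and the activity labels in `ACTIVITY_TO_CATEGORY` are hypothetical stand-ins for the iSTART-ME database schema, which the text does not describe; only the four category names come from the text. Per-student counts are then summarized as a mean and standard deviation, the form in which the usage statistics above are reported (the sample-SD denominator is an assumption).

```python
from dataclasses import dataclass
from statistics import mean, stdev

# Hypothetical activity labels mapped to the four game-based categories.
ACTIVITY_TO_CATEGORY = {
    "coached_practice": "generative",
    "map_conquest": "generative",
    "showdown": "generative",
    "balloon_bust": "identification",
    "edit_avatar": "personalizable",
    "change_background": "personalizable",
    "edit_buddy": "personalizable",
    "view_achievements": "achievement",
}


@dataclass
class LogEvent:
    student_id: str
    activity: str
    completed: bool  # False if the student exited before finishing


def category_sequences(events):
    """Return {student_id: [category, ...]} in logged order, completed actions only."""
    sequences = {}
    for event in events:
        if not event.completed:
            continue  # abandoned actions are not counted
        category = ACTIVITY_TO_CATEGORY[event.activity]
        sequences.setdefault(event.student_id, []).append(category)
    return sequences


def usage_summary(sequences, category):
    """Mean and sample SD (n - 1 denominator) of per-student counts for one category."""
    counts = [seq.count(category) for seq in sequences.values()]
    return mean(counts), stdev(counts)


log = [
    LogEvent("s1", "showdown", True),
    LogEvent("s1", "view_achievements", True),
    LogEvent("s1", "map_conquest", False),  # exited early: dropped
    LogEvent("s2", "balloon_bust", True),
    LogEvent("s2", "coached_practice", True),
    LogEvent("s2", "showdown", True),
]
seqs = category_sequences(log)
print(seqs["s1"])  # ['generative', 'achievement']
print(usage_summary(seqs, "generative"))
```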

Quantitative Methods 

To examine variations in students’ behavior patterns within iSTART-ME, random walk 
analyses and Hurst exponents were calculated. Surrogate analyses were conducted to 
validate the interpretability of Hurst exponents. Linear regressions were calculated to assess 
how students’ behavior patterns influenced learning outcomes. The following section 
provides a description and explanation of random walk, Hurst, and surrogate analyses. 

Random Walk Analyses 

Random walk analyses were used in this study to visualize students’ interaction 
patterns with iSTART-ME by examining the sequential order of students’ interactions 
(i.e., choice of game-based feature) with the four types of game-based features (i.e., 
generative practice games, identification mini-games, personalizable features, and 
achievement screens). Each of these feature types was assigned to an orthogonal vector 
on an XY scatter plot: generative practice games (-1,0), identification mini-games 
(0,1), personalizable features (1,0), and achievement screens (0,-1). These assignments are 
arbitrary and do not reflect any qualitative value of the activity. 

Each student’s unique walk was traced by first placing an imaginary particle at the 
origin (0,0). Every time that a student interacted with one of the four feature categories, 
the particle moved in a manner consistent with the vector assignment (see Table 1 for 
axis directional assignment). The use of these vectors allows us to define the movements 
that students make within the system. In the current study, vectors do not represent 
positive or negative dimensions; they simply provide a space on the grid to track users’ 
pattern of movements. Thus, the directionality of the axes can vary as long as they are 
consistent throughout the entire analysis. In the current work, the axis direction 
assignment was set prior to the analysis and remained consistent for every student. 

Figure 5 illustrates what a random walk might look like for a student with five 
interactions. The starting point for all the walk sequences is (0,0); this is where the horizontal 
and vertical axes intersect (see # 0 in Fig. 5). In this example, the first interaction that the 
student engaged in was a mini-game; so, the particle moves one unit up along the Y-axis (see 
# 1 in Fig. 5). The second interaction in which the student engaged was with a generative 
practice game, which moves the particle one unit left along the X-axis (see # 2 in Fig. 5). The 
student’s third interaction was with an achievement screen, which moves the particle one 
unit down along the Y-axis (see # 3 in Fig. 5). The fourth hypothetical interaction was with 
another achievement screen, which again moves the particle one unit down along the Y-axis 
(see # 4 in Fig. 5). Finally, for the fifth and final particle move, the student interacted with a 
personalizable feature, which moves the particle one unit to the right along the X-axis (see 
# 5 in Fig. 5). These simple rules were applied to every interaction a student made within 
iSTART-ME. This analysis resulted in a unique walk for each of the 40 students. 
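The walk-tracing rules can be expressed compactly in Python. The step vectors follow Table 1; the category labels and the function name `trace_walk` are illustrative assumptions, not the authors' code. Run on the five-interaction example just described, the sketch reproduces the particle positions numbered 0 to 5 in that walkthrough.

```python
# Unit step for each interaction category, following the axis assignment in Table 1.
STEPS = {
    "generative": (-1, 0),     # generative practice games: move left
    "identification": (0, 1),  # identification mini-games: move up
    "personalizable": (1, 0),  # personalizable features: move right
    "achievement": (0, -1),    # achievement screens: move down
}


def trace_walk(choices):
    """Return the particle's (x, y) positions, starting at the origin (0, 0)."""
    x, y = 0, 0
    positions = [(x, y)]
    for choice in choices:
        dx, dy = STEPS[choice]
        x, y = x + dx, y + dy
        positions.append((x, y))
    return positions


example = ["identification", "generative", "achievement", "achievement", "personalizable"]
print(trace_walk(example))
# [(0, 0), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1)]
```

Because only the order of choices and the fixed step table matter, the same function yields one walk per student when applied to each student's category sequence.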

Figure 6a and b illustrate two random walks that were generated using students’ log 
data. In Fig. 6a (random walk on the left), the generated random walk reveals that this 
particular student interacted most frequently with the generative practice games. This 
walk trajectory is primarily anchored along the generative practice axis. Conversely, the 
student who generated the random walk in Fig. 6b (random walk on the right) 
interacted most frequently with both generative practice games and identification 
mini-games. This is demonstrated by the trajectory of their walk, as it hovers between 
the generative practice games and identification mini-games axes. These two contrast- 
ing figures demonstrate how log data can be used to generate a unique spatial 
representation of each student’s time in the system. Random walks provide a visuali- 
zation of students’ interaction paths within the iSTART-ME system. It is important to 
note that this technique can be used on any number of categorical variables. In the 
current study, all walks lie on an XY plane; however, these flexible visualization tools 
can be extended to any number of dimensions and vectors. 


Table 1 System interaction choices and corresponding axis direction assignment 

System Interaction Choice      Axis Direction Assignment 
Generative Practice Games      -1 on X-axis (move left) 
Identification Mini-Games      +1 on Y-axis (move up) 
Personalizable Features        +1 on X-axis (move right) 
Achievement Screens            -1 on Y-axis (move down) 
Fig. 5 Example random walk with five interaction choices 


Although random walks provide an illustration of students’ movements within a
game-based environment, they do not quantify these patterns. Thus, to quantify
patterns of movement from these walks, distance time series were constructed for
each student by calculating the Euclidean distance for each step in the walk.
Distance was calculated from the origin to each step with Eq. (1), where y represents
the particle’s position on the y-axis, x represents the particle’s position on the x-axis,
i represents the ith step in the walk, and the subscript 0 denotes the origin:

\mathrm{Distance}_i = \sqrt{(y_i - y_0)^2 + (x_i - x_0)^2} \qquad (1)
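Eq. (1) translates directly into code. The sketch below is an illustration rather than the authors' implementation; it assumes the walk is stored as a list of (x, y) tuples whose first element is the origin, and the example path is hypothetical.

```python
import math


def distance_series(path):
    """Euclidean distance from the walk's origin (x_0, y_0) to each step i >= 1 (Eq. 1)."""
    x0, y0 = path[0]
    return [math.hypot(x - x0, y - y0) for x, y in path[1:]]


# Hypothetical walk: positions after each of five choices, starting at the origin.
path = [(0, 0), (-1, 0), (-2, 0), (-2, 1), (-1, 1), (-1, 0)]
print([round(d, 3) for d in distance_series(path)])
# [1.0, 2.0, 2.236, 1.414, 1.0]
```

Whether the origin itself (distance 0) is included as the first point of the series is not specified in the text; this sketch omits it.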


Fig. 6 a and b. Random walks generated from two different students’ log data (axes labeled Generative Practice Games, Identification Mini-Games, Personalizable Features, and Achievement Screens)

Fig. 7 Illustration of the first (a) and second (b) iteration of the second step of the DFA procedure for a single time
series. In both figures, vertical lines represent the binning procedure that occurs during the second step of DFA


Hurst Exponents

To classify the tendency of students’ interaction patterns based on the distance time
series analyses, Hurst exponents were calculated using Detrended Fluctuation Analysis
(DFA; Peng et al. 1994). DFA is a method for estimating persistence (i.e., deterministic
tendencies) in a time series by determining how a measure of variance depends on scale
size (Peng et al. 1994). The DFA algorithm is captured by the following equation:


F(n) = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left[y(k) - y_n(k)\right]^2} \qquad (2)


where N is the total number of observations, y(k) is the kth value of the integrated
series (the profile created in the first step below), y_n(k) is the predicted value of
y(k) from a local trend line, and n is the window size for a given scale. More
concretely, the DFA algorithm involves four simple steps. The first step is to create
the profile by subtracting the mean from the time series and then taking the
cumulative sum (i.e., integrating). The second step involves dividing a series of
length N into N/n non-overlapping bins, such that each bin contains n observations.
The third step is to compute the root-mean-square residual across all bins (Fig. 7a).
The residual is obtained by subtracting local trend lines within each bin; this process
is repeated for several values of n, halving n at each iteration (Fig. 7b). The maximum
n should be N/2, and the result is a fluctuation function, F(n). The fourth and final
step is to regress log₂ F(n) on log₂ n (Fig. 8). In the case of persistence, the expected
result from the final step is a linear slope, α, greater than 0.5.
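The four DFA steps can be sketched in plain Python. This is a minimal, first-order (linear-detrending) version written for illustration, not the authors' code. It assumes window sizes start at N/2 and are halved down to a hypothetical minimum of four observations. It also drops any leftover observations that do not fill a complete bin, so the 1/N in Eq. (2) becomes one over the number of observations actually binned.

```python
import math
import random


def _ols(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def dfa(series, min_n=4):
    """Estimate the DFA scaling exponent (the slope reported as the Hurst exponent)."""
    N = len(series)
    mean = sum(series) / N
    # Step 1: profile = cumulative sum of the mean-centred series.
    profile, total = [], 0.0
    for value in series:
        total += value - mean
        profile.append(total)

    log_n, log_f = [], []
    n = N // 2  # the maximum window size is N/2
    while n >= min_n:
        # Step 2: split the profile into N // n non-overlapping bins of n points.
        bins = N // n
        t = list(range(n))
        sq_resid = 0.0
        for b in range(bins):
            segment = profile[b * n:(b + 1) * n]
            # Step 3: subtract a local linear trend and accumulate squared residuals.
            slope, icpt = _ols(t, segment)
            sq_resid += sum((y - (icpt + slope * k)) ** 2 for k, y in zip(t, segment))
        f_n = math.sqrt(sq_resid / (bins * n))  # root-mean-square residual F(n)
        if f_n > 0:
            log_n.append(math.log2(n))
            log_f.append(math.log2(f_n))
        n //= 2  # halve the window size for the next iteration

    # Step 4: the slope of log2 F(n) regressed on log2 n.
    return _ols(log_n, log_f)[0]


rng = random.Random(0)
white = [rng.gauss(0, 1) for _ in range(2048)]
print(round(dfa(white), 2))  # expected to be near 0.5 for uncorrelated noise
```

For a 275-point series, comparable to the average number of choices per student reported in this study, the window sizes visited would be 137, 68, 34, 17, 8, and 4.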

The above steps are depicted in Figs. 7 and 8. Figure 7 shows an example time series
for a single student’s interaction trajectory, where each interaction is taken as the
analogue of a unit of time. Precedent for treating observations in this manner can
be found throughout the literature (e.g., Peng et al. 1994; Van Orden et al. 2003).
Fig. 8 Depiction of the fourth and final step in detrended fluctuation analysis (x-axis: log₂ Scale). The equation represents the
regression procedure, and the slope indicates the Hurst exponent


Figure 7 also depicts two of the detrending steps that give DFA its name. Figure 7a
shows the first of those steps: a regression line is fit to each of the demarcated bins. The
fitted lines are subtracted from the time series to obtain the residuals used to compute
F(n) at a given scale. Figure 7b displays the subsequent iteration, in which the window
size is reduced by exactly one-half and the fitting and detrending procedure is repeated.
Once F(n) has been obtained for all n, the regression analysis depicted in Fig. 8 is
conducted. That is, the base-2 logarithm of F(n) is regressed onto the base-2
logarithm of Scale (i.e., n). The resulting slope is the Hurst exponent. The interpretive
index for Hurst is as follows: 0.5 < H < 1 indicates persistent (deterministic or
controlled) behavior, H = 0.5 signifies random behavior, and 0 < H < 0.5 denotes
antipersistent behavior.
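The interpretive index can be written as a small helper. The tolerance band around 0.5 is a hypothetical addition, since estimated exponents are almost never exactly 0.5; the text itself labels only H = 0.5 as random. Exponents at or above 1 fall outside the stated persistent range, and this sketch simply labels them persistent.

```python
def classify_hurst(h, tol=0.05):
    """Label a Hurst exponent using the interpretive index described in the text.

    `tol` is a hypothetical tolerance around 0.5 (not part of the original index).
    """
    if abs(h - 0.5) <= tol:
        return "random"
    return "persistent" if h > 0.5 else "antipersistent"


for h in (0.30, 0.50, 0.77):
    print(h, classify_hurst(h))
```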

Within the current work, Hurst exponents are used to quantify changes in students’
interaction patterns (i.e., distance time series). We are specifically interested in whether
students’ choice patterns reveal deterministic (i.e., controlled) or random (i.e.,
independent) tendencies. Hurst exponents index long-range correlations; therefore, they
describe how an entire interaction pattern changes and manifests across time. In
the current work, deterministic interaction patterns are assumed to reflect self-organized
and controlled processes (Van Orden et al. 2003). By contrast, interaction patterns that
exhibit random tendencies reflect a breakdown in system functioning and control (e.g.,
Peng et al. 1995). Finally, when interaction patterns exhibit antipersistent tendencies,
they are acting as corrective (i.e., negatively correlated) processes (Collins and De Luca
1994). Using this classification, the Hurst exponent affords us the opportunity to
examine how controlled students are when they interact with the four types of
game-based features embedded within iSTART-ME (i.e., generative practice games,
identification mini-games, personalizable features, and achievement screens).

Surrogate Analysis 

Surrogate analysis is an important step in time series analysis when using brief time 
series, as in the present case (Theiler et al. 1992). Clearly, the DFA procedure outlined 
above could be applied to any time series and result in a Hurst exponent. What is 
needed, though, is a means to determine whether the observed exponent accurately 
represents the underlying process. Surrogate analysis fills that need by providing a 
principled, statistical means to distinguish time series generated by random processes 
from time series generated by deterministic processes. The general approach of surrogate
analysis is to compare an observed measure, like the Hurst exponent, to similar
measures derived from randomly shuffled surrogate data (Theiler et al. 1992). The idea 
is that the analyzed time series may be a random process that merely appears to exhibit 
persistent- or antipersistent-like behavior over a short interval. If so, then randomly 
shuffling the series should not affect the scaling structure. If not, and the scaling 
behavior is genuine, then shuffling the time series should deteriorate the scaling 
structure. Surrogate analysis tests those hypotheses. 

In the current context, the surrogate analysis tests the null hypothesis that the 
observed Hurst exponent is an artifact of short series length. We implemented the 
surrogate analysis by shuffling each time series 40 times and then by performing a DFA 
on each of the shuffled series. We then compared the average surrogate-derived Hurst
exponent for each time series with its observed Hurst exponent counterpart using a
paired-samples t-test. The test revealed that shuffled surrogates produced smaller Hurst
exponents than did the intact series, t(39) = 156.90, p < 0.001, supporting the
conclusion that the observed Hurst exponents accurately represent the observed pat- 
terns of persistence. 
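The shuffle-surrogate procedure can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions: it embeds a compact first-order DFA (windows halved from N/2 to a hypothetical minimum of 4), shuffles each series 40 times as described above, and computes the paired-samples t statistic by hand. The input series are synthetic cumulative sums of Gaussian noise standing in for students' distance time series; they are not study data.

```python
import math
import random
import statistics


def _ols(xs, ys):
    """Least-squares slope and intercept of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return slope, my - slope * mx


def dfa(series, min_n=4):
    """Compact first-order DFA; returns the slope of log2 F(n) on log2 n."""
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for v in series:
        total += v - mean
        profile.append(total)
    N, n, pts = len(profile), len(profile) // 2, []
    while n >= min_n:
        sq, t = 0.0, list(range(n))
        for b in range(N // n):
            seg = profile[b * n:(b + 1) * n]
            slope, icpt = _ols(t, seg)
            sq += sum((y - icpt - slope * k) ** 2 for k, y in zip(t, seg))
        if sq > 0:
            # 0.5 * log2(mean squared residual) equals log2 F(n)
            pts.append((math.log2(n), 0.5 * math.log2(sq / ((N // n) * n))))
        n //= 2
    return _ols([p[0] for p in pts], [p[1] for p in pts])[0]


def surrogate_test(series_list, n_shuffles=40, seed=0):
    """Compare each series' DFA exponent with the mean exponent of shuffled copies.

    Returns observed exponents, mean surrogate exponents, the paired t statistic,
    and its degrees of freedom.
    """
    rng = random.Random(seed)
    observed, surrogate = [], []
    for series in series_list:
        observed.append(dfa(series))
        shuffled_hs = []
        for _ in range(n_shuffles):
            copy = list(series)
            rng.shuffle(copy)  # destroys temporal order, keeps the value distribution
            shuffled_hs.append(dfa(copy))
        surrogate.append(statistics.mean(shuffled_hs))
    diffs = [o - s for o, s in zip(observed, surrogate)]
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
    return observed, surrogate, t, len(diffs) - 1


# Synthetic, strongly persistent series (cumulative sums of Gaussian noise).
gen = random.Random(1)
walks = []
for _ in range(5):
    total, walk = 0.0, []
    for _ in range(256):
        total += gen.gauss(0, 1)
        walk.append(total)
    walks.append(walk)

obs, sur, t, df = surrogate_test(walks)
print(f"t({df}) = {t:.2f}")
```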


Results 

Hurst Exponents and Surrogate Analyses 

To characterize how students interacted with the system over the course of the 8
training sessions, Hurst exponents were calculated using DFA and students’ distance
time series derived from the individual random walks. Hurst exponents suggested
that students varied considerably, from weakly persistent (some random tendencies)
to strongly persistent (range = 0.57 to 1.00, M = 0.77, SD = 0.11). A surrogate
analysis was then conducted to assess the reliability of the Hurst exponents.
The surrogate analysis revealed that Hurst exponents derived from intact series
differed from those calculated on shuffled time series, t(39) = 156.90,
p < 0.001, suggesting that the Hurst exponents characterizing students’ interaction
patterns were reliable.

High and Low Hurst Student Examples 

Within the current study, students’ Hurst exponents ranged considerably. Indeed, some 
students acted in a weakly persistent manner, while others were more deterministic in 
their choice patterns. Figure 9 illustrates how two students (one low Hurst and one high 
Hurst) differed from each other in terms of percentage of interactions with each type of 
game-based feature. In Fig. 9, the low Hurst student (Hurst of 0.60) demonstrated more 
variation in the interaction pattern, where no one feature was favored. Conversely, the
high Hurst student (Hurst of 0.90) interacted primarily with the generative practice
games. While these are only two examples of the difference between high and low
Hurst students, the general notion is that low Hurst students acted more impetuously
and jumped between features more frequently, whereas high Hurst students acted in a
more controlled and deterministic manner.
Interaction Choices

Figure 9 shows how the interaction choices of one high Hurst student and one low Hurst
student differed. However, to examine relations between Hurst exponents (i.e., as a
measure of deterministic or random tendencies) and students’ frequency of interaction
choices across the whole sample, we conducted Pearson correlations. Results from this
analysis revealed that students’ Hurst exponents were not significantly related to
students’ frequency of interactions with generative practice games (r = .25, p = .11),
identification mini-games (r = -.12, p = .45), personalizable features (r = -.06,
p = .70), or achievement screen views (r = -.24, p = .13). These results indicate that
students’ interaction patterns (i.e., Hurst exponents) within the system were not related
to how often they used any specific feature. Thus, Hurst exponents are not capturing
which game-based features students interact with, but rather how they interact with
these features.

Learning Outcomes 

Using Pearson correlations, we measured the strength of the relation between the Hurst
exponents (i.e., as a measure of deterministic or random tendencies) and students’
self-explanation scores during training (in-system performance), as well as their posttest
and retention test scores (see Table 2). Results from this analysis revealed that students’
Hurst exponents were significantly related to their average self-explanation scores
during training (r = .51, p < .001) and at retention (r = .31, p = .05). However, there
was no relation between students’ Hurst exponents and their self-explanation scores at
posttest (r = .09, p = .59). These results are consistent with previous work showing that
for generative activities, such as self-explanation, the impact of training is not observed
immediately after training (at posttest). Rather, the effects of training are more likely to
be apparent after a delay (e.g., Adams et al. 2014; Dunlosky et al. 2013; Schmidt and
Bjork 1992). Overall, the results from this analysis indicate that when students’
interaction patterns within the system revealed more deterministic properties, they
generated higher quality self-explanations during training and at retention.

Fig. 9 Two sample students’ percentage of game-based interactions as a function of Hurst score, where high
Hurst is indicative of more deterministic behavior patterns

To further investigate how interaction patterns impacted daily learning outcomes
(i.e., training self-explanation scores), we used a linear regression model to control for
students’ pretest self-explanation scores. In model one of this analysis, we used pretest
self-explanation scores to predict daily training self-explanation scores. Results from
this analysis revealed that the pretest self-explanation score was a significant predictor
of students’ daily self-explanation scores (R² = .26, F(1,38) = 13.60, p < .01; see
Table 3). In model two, we examined the degree to which students’ Hurst exponents
predicted daily self-explanation scores over and above the pretest self-explanation
score. Results from this analysis indicated that Hurst exponents were a significant
predictor of daily self-explanation scores over and above the pretest self-explanation
score (R² = .44, F(1,37) = 11.93, p < .01; see Table 3). This analysis demonstrates that
students’ Hurst exponents accounted for an additional 18% of the variance in students’
daily self-explanation quality over and above the pretest self-explanation score.

A similar linear regression model was conducted to investigate the degree to which
interaction patterns impacted performance at the retention test over and above students’
pretest self-explanation score. In model one of this analysis, we used pretest
self-explanation scores to predict retention self-explanation quality. Results from this
initial analysis demonstrated that pretest self-explanation quality was a significant
predictor of students’ retention self-explanation quality (R² = .22, F(1,38) = 10.67,
p < .01; see Table 4). In model two, we examined the degree to which students’ Hurst
exponents predicted their retention self-explanation quality over and above pretest
self-explanation quality. Results from this analysis indicated that students’ Hurst
exponents did not significantly predict their retention self-explanation quality over and
above pretest self-explanation quality (R² = .27, F(1,37) = 2.70, p = .10; see Table 4).
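The increase in R² between model one and model two (e.g., 0.44 - 0.26 = 0.18, the 18% reported above) can be checked with the standard F test for an R² increment when predictors are added to a regression. The sketch below is an illustrative check, not the authors' analysis script; it assumes N = 40 students and two predictors in model two. Because the R² values in the text are rounded to two decimals, it only approximately reproduces the reported F values.

```python
def f_change(r2_full, r2_reduced, n_obs, k_full, q=1):
    """F statistic for the R^2 increase when q predictors are added.

    k_full is the number of predictors in the larger model; the test has
    (q, n_obs - k_full - 1) degrees of freedom.
    """
    df2 = n_obs - k_full - 1
    f = ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df2)
    return f, (q, df2)


# Training scores: model 1 R^2 = .26, model 2 R^2 = .44, N = 40.
print(f_change(0.44, 0.26, 40, 2))
# Retention scores: model 1 R^2 = .22, model 2 R^2 = .27, N = 40.
print(f_change(0.27, 0.22, 40, 2))
```

The first call gives F(1, 37) of about 11.89 against the reported 11.93, and the second about 2.53 against the reported 2.70; both gaps fall within what two-decimal rounding of R² allows.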


Discussion 


Game-based environments frequently afford students opportunities to exert agency 
over their learning path. A predominant assumption by many researchers and educators 
is that students’ ability to control their behaviors during learning has a positive and 
important impact on their academic success (Hadwin et al. 2007; Sabourin et al. 2012; 
Zimmerman 1990). However, assessing variations in these behaviors can be difficult, 
and traditionally has relied upon self-report measures. A common concern about this 
methodology is that self-report measures do not adequately capture the fine-grained 
changes that occur in students’ behaviors over time. Hence, nuanced and dynamic 
measures are needed to gain a deeper understanding of students’ ability to control their
behaviors (Hadwin et al. 2007).


Table 2 Correlations between self-explanation scores and Hurst exponents

Strategy performance                  r
Training self-explanation score       0.51**
Posttest self-explanation score       0.09
Retention self-explanation score      0.31*

*p = .05; **p < .001


Table 3 Linear regression analyses predicting daily self-explanation quality

Variable                      B       SE      β        ΔR²
Model 1                                                .26**
  Pretest self-explanation    .41     .11     .51**
Model 2                                                .18**
  Pretest self-explanation    .35     .10     .43**
  Hurst exponent              1.89    .55     .43**

*p < .05; **p < .01


Log data has been previously used to analyze variations in students’ learning behaviors
at more fine-grained levels (Hadwin et al. 2007;
Sabourin et al. 2012; Snow et al. 2014). The work presented here builds upon these 
findings by conducting dynamic analyses of system log data to investigate the extent to 
which students’ behaviors exhibit deterministic or random properties. These initial 
analyses explore how dynamic techniques can potentially act as a form of stealth 
assessment within systems, such as iSTART-ME. Such assessments have a strong 
potential to deepen our understanding of the relations between learning outcomes and 
sequences in students’ behaviors within adaptive environments. 

The current study made use of novel methodologies by employing random walk and 
Hurst exponent analyses in an attempt to capture each student’s unique interaction pattern 
within iSTART-ME. Past research using Hurst exponents points to the use of this scaling 
variable as an indicator of the degree to which students’ interaction patterns are controlled 
and deterministic (Mandelbrot 1982; Van Orden et al. 2003). Specifically, when students’ 
interaction patterns have deterministic tendencies, it may be indicative that they are 
exhibiting persistent and controlled behavior patterns. Conversely, when students’
interaction patterns are weakly persistent (indicating random tendencies), it may be indicative
that they are not behaving with purpose, control, or persistence. These tendencies across 
long periods of time may reveal trends in how students approach learning tasks. Therefore, 
this work begins to shed light upon the dynamic nature of learning behaviors that students
exhibit while interacting within game-based environments.

Results from the current study fall in line with previous work that has shown that 
students’ ability to control and regulate their learning behaviors has a positive impact on 
learning outcomes (Butler and Winne 1995; Pintrich and De Groot 1990; Zimmerman and
Schunk 1989; Zimmerman 1990). Specifically, we found a significant positive relation
between controlled patterns of interactions (i.e., Hurst scores) and self-explanation quality
assessed during training and at the retention test (though not with performance at posttest).
These results are consistent with previous work showing that when students engage in
generative activities (e.g., self-explanation), the effects on learning are often not apparent
immediately after training at posttest, but rather emerge more strongly at delayed retention
tests (e.g., Adams et al. 2014; Dunlosky et al. 2013; Schmidt and Bjork 1992).


Table 4 Linear regression analyses predicting retention self-explanation quality outcomes

Variable                      B       SE      β        ΔR²
Model 1                                                .22**
  Pretest self-explanation    .47     .14     .47**
Model 2                                                .05
  Pretest self-explanation    .43     .61     .43**
  Hurst exponent              1.28    .77     .23

*p < .05; **p < .01

Overall, these results suggest that when students are given more control over their
environment, there are some potential consequences, at least for in-system performance.
This may be especially important within game-based environments, as they often offer
students numerous opportunities to control their trajectory within the system (King and
Cazessus 2014; Sabourin et al. 2012; Teng 2010). It is also important to note that within
iSTART-ME, it seems to be critical for students to engage with the system in a more
deterministic way. However, such deterministic engagement may or may not be appropriate,
depending on the learning goals embedded within the environment. In other game-based
systems, it may be more prudent for students to explore the system interface more
frequently, thus revealing a more impetuous behavior pattern. As such, the findings presented here are
meant to provide evidence that students’ behaviors within game-based environments 
are linked to learning outcomes and should be measured by researchers when they 
evaluate the effectiveness of their respective system. 

This study serves as a starting point for scientists to apply dynamic techniques to 
system log data as a way to trace and classify students’ interactions. These analyses are 
intended as a seed for future studies by providing evidence that dynamic methodologies 
show strong promise in providing online stealth indicators of controlled behaviors. The 
analyses in this study focused on students’ movements between four game-based 
features. However, dynamic methodologies are flexible and the only notable limitation 
of these methods is that they must be applied to temporal data. Thus, dynamic 
techniques are generalizable to almost any type of time-stamped log data from any 
type of system with any number of actions (i.e., choices or behaviors). 

One limitation of the current study is that to reliably calculate Hurst exponents, 
numerous data points are needed. In the current study, over 11,000 game-based
interactions were captured, with each student averaging over 275 choices. Thus, as can be
imagined, replication of such an in-depth data set is difficult. One way to counter this
problem is to use another measure of order or disorder that requires fewer temporal data
points. For instance, an entropy calculation can be used to quantify order and disorder
with far fewer data points than the Hurst exponent (Snow et al. 2015). Thus, although Hurst
exponents require numerous data points, alternative dynamic methodologies may also 
capture the degree to which patterns are ordered versus random while using less data. 
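As an illustration of the kind of entropy measure mentioned above, the sketch below computes Shannon entropy (in bits) over the relative frequencies of a student's choices. This generic formulation is chosen for illustration; whether it matches the calculation used by Snow et al. (2015) is an assumption, and the example logs and category labels are hypothetical. Note that this simple form ignores the order in which choices were made.

```python
import math
from collections import Counter


def shannon_entropy(choices):
    """Shannon entropy (bits) of the relative frequencies of categorical choices."""
    total = len(choices)
    # sum of p * log2(1 / p) over the observed categories
    return sum((c / total) * math.log2(total / c) for c in Counter(choices).values())


# Hypothetical logs: one student spreads choices evenly over four features,
# another always picks the same feature.
even = ["generative_practice", "identification", "personalizable", "achievement"] * 5
focused = ["generative_practice"] * 20
print(shannon_entropy(even), shannon_entropy(focused))
# 2.0 0.0
```

With four features, the maximum of 2 bits indicates choices spread evenly, and 0 indicates a single repeated choice.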

Another limitation of the current work is that we did not include self-report measures 
of self-regulation or related constructs. While we hypothesize that students who act in a
more deterministic manner are exerting control and may be self-regulating, the results 
presented here do not support such an assertion. Accordingly, the next steps in this 
research agenda would necessarily include further confirmatory studies demonstrating 
concurrent validity. For example, an obvious extension of the current work will be to 
include self-report measures that have been traditionally used to assess controlled 
behaviors and constructs such as self-regulation. The outcomes of such assessments 
can then be compared to those provided by dynamic assessments. Notably, however, 
the results of such studies may be inconclusive given that one supposition of the
current research is that self-report measures are fundamentally flawed. Recently,
researchers have begun to move away from the use of self-reports and instead have 
relied on game-play or performance data to measure constructs such as persistence and 
deterministic behavior (DiCerbo 2014; Ventura and Shute 2013; Ventura et al. 2013). 
Thus, the horizon of future research seems to be pointing toward the establishment of 
the respective utility of using static versus dynamic assessments of learning behaviors. 

Another future direction of the current study regards the practical use of this 
approach. Ultimately, the purpose of using dynamic measures is to capture variations 
in learning behaviors in real time. Hence, the true test lies in the implementation of
these measures within adaptive learning environments to evaluate their utility in those 
contexts. For example, one crucial question for future research regards the use of 
visualization and dynamic techniques as a means to unobtrusively assess students’
behavior patterns. Such analyses will be especially valuable if systems are able to 
recognize non-optimal patterns and steer students toward more effective behaviors. For 
instance, if a student is engaging in a random interaction loop, it may be beneficial for 
adaptive learning environments to have the capability to recognize these patterns and 
prompt the student toward a more deterministic trajectory. 

In conclusion, this study explored the use of two dynamic methodologies to unobtrusively
assess deterministic behaviors and their impact on learning within a game-based
environment. These analyses are among the first attempts to examine variations in 
students’ log data to capture tendencies in online behaviors and subsequent interaction 
patterns across time. Student models rely on understanding the relation between students’ 
abilities and performance. We expect tracking and modeling interaction trends over time to 
be crucial to improving adaptivity within systems that provide students with agency over
the environment. Overall, these findings afford researchers the opportunity to understand 
the dynamic nature of learning behaviors and their impact on student learning. 